The Open University


1 Greek alphabet 


α  Α  Alpha        ι  Ι  Iota         ρ  Ρ  Rho
β  Β  Beta         κ  Κ  Kappa        σ  Σ  Sigma
γ  Γ  Gamma        λ  Λ  Lambda       τ  Τ  Tau
δ  Δ  Delta        μ  Μ  Mu           υ  Υ  Upsilon
ε  Ε  Epsilon      ν  Ν  Nu           φ  Φ  Phi
ζ  Ζ  Zeta         ξ  Ξ  Xi           χ  Χ  Chi
η  Η  Eta          ο  Ο  Omicron      ψ  Ψ  Psi
θ  Θ  Theta        π  Π  Pi           ω  Ω  Omega


2 Notation 


General notation 


$N(\mu, \sigma^2)$
$M(\lambda)$
$U(a, b)$
$B(n, p)$
Poisson($\mu$)


$\hat{\theta}$
$(\theta^-, \theta^+)$


number of observations in a sample, or sample size 
data values in a sample 

summation sign 

sample mean 

sample median 

sample standard deviation, sample variance 
probability density function of X 

probability mass function of X 

probability density function 

probability mass function 

expectation or mean of X 

variance of X 

$\alpha$-quantile

is approximately equal to 

is distributed as 

has approximately the same distribution as 
normal distribution with mean $\mu$ and variance $\sigma^2$
exponential distribution with parameter $\lambda$


continuous uniform distribution on the interval $a < x < b$


binomial distribution with parameters n and p
Poisson distribution with parameter $\mu$

estimate or estimator of a parameter $\theta$

lower and upper confidence limits for $\theta$

null and alternative hypotheses 

significance probability 

sample covariance of observations on X and Y 
conditional probability that Y = y given that X = x 
Pearson correlation coefficient 


Medical statistics 


P(D|E)
P(D|not E)
RR 


probability of disease D, given exposure E 

probability of disease D, given no exposure E 

relative risk 

odds ratio 

entries in a 2 x 2 table for a cohort or case-control study 
numbers exposed and not exposed in a cohort study 
numbers with and without disease in a case-control study 


odds ratio for exposure category i relative to the reference exposure category,
or odds ratio for stratum i, or odds ratio for dose level i relative to the lowest dose


observed value for the ith cell of a contingency table

expected value for the ith cell of a contingency table

chi-squared distribution on $\nu$ degrees of freedom

test statistic for the chi-squared test for no association and McNemar’s test 
Mantel-Haenszel estimate of the common odds ratio 

numbers of discordant pairs in a 1-1 matched case-control study 

randomized controlled trial 

significance level (for sample size calculation) 

power (for sample size calculation) 

design values for the treatment group (T) and control group (C) (for sample size calculation) 


Time series 


$X_t$


$x_t$


$\alpha_k$

PACF

$Z_t$

AR(p)
MA(q)
ARMA(p, q)


time series, or the random variable representing the value at time t in a time series 
observed time series, or the observed value at time t 
period of a cyclic time series 

trend component of a time series, or the level at time t 
seasonal component of a time series 

seasonal factors 

irregular (or random) component of a time series 
moving average centred on t (for smoothing) 

weighted moving average used for removing the seasonal component of a seasonal time series 
raw seasonal factor for season j 

1-step ahead forecast of $X_{n+1}$

smoothing parameters for exponential smoothing 
1-step ahead forecast error 

sum of squared errors 

sample autocorrelation at lag k 

autocorrelation at lag k 

autocorrelation function 

partial autocorrelation at lag k 

partial autocorrelation function 

white noise 

autoregressive model of order p 

moving average model of order q 

autoregressive moving average model of order (p, q) 


ARIMA(p, d, q) integrated autoregressive moving average model of order (p, d, q)


d 


order of differencing 


Multivariate analysis 


$\mathrm{Corr}(X_j, X_k)$
$Y_1, Y_k$


dimension of a multivariate data set (number of variables) 
number of observations in a multivariate data set 

data matrix, with n rows and p columns 

jth column of a data matrix, containing values of the jth variable 
value of $X_j$ for observation i; (i, j)th element of X

vector with jth element $y_j$

sample mean of $X_j$
mean vector of $X_1, \ldots, X_p$

sample variance of $X_j$

sample covariance between $X_j$ and $X_k$

standardized (or group-standardized) variable

covariance matrix of $X_1, \ldots, X_p$

correlation coefficient between $X_j$ and $X_k$
first and kth principal components of a data set 


loading of the first principal component, or of the first discriminant 
function, for the jth variable 


loading of the kth principal component, or of the kth discriminant 
function, for the jth variable 

loadings vectors 

total variance 

percentage variance explained 

cumulative percentage variance explained 

number of groups 

number of observations in group g 

total number of observations in all groups, $N = n_1 + \cdots + n_G$
for grouped data: mean of X for group g 


grand mean of X 

for grouped data: sample variance of X for group g 

within-groups variance 

between-groups variance 

within-groups covariance of $X_j$ and $X_k$

between-groups covariance of $X_j$ and $X_k$

within-groups covariance matrix

between-groups covariance matrix

first and kth discriminant functions

separation achieved by the linear combination D 

loading for a discriminant function based on group-standardized variables 
percentage separation achieved by the jth discriminant function 
cumulative percentage separation achieved by the first j discriminant functions 
cutpoints for an allocation rule involving G groups 


Bayesian statistics 


P(A) probability of event A 

P(A|B) probability of A given B 

$f(\theta)$ prior density of $\theta$

$L(\theta)$ likelihood of $\theta$ given data

$\theta \mid \text{data}$ the parameter $\theta$, conditional on data

$f(\theta \mid \text{data})$ posterior density of $\theta$

$N(a, b)$ normal prior with mean a and variance b
Beta(a, b) beta prior with parameters a and b
Gamma(a, b) gamma prior with parameters a and b

$U(a, b)$ uniform prior on the interval $a < x < b$

$\tau$ precision $\sigma^{-2}$, the reciprocal of the variance
$(L, U)$ equal-tailed $100(1-\alpha)\%$ interval for a parameter $\theta$ as used to specify a prior density
$(l, u)$ $100(1-\alpha)\%$ credible interval for a parameter $\theta$
HPD highest posterior density 

N number of samples drawn in a simulation 

MC Monte Carlo 

MCMC Markov chain Monte Carlo 


3 Table of discrete probability 
distributions 


Name              Notation         Typical use                                                   Range               Probability mass function p(x)      Mean         Variance

Binomial          $B(n, p)$        Total number of successes in n independent Bernoulli trials   $0, 1, \ldots, n$   $\binom{n}{x} p^x (1-p)^{n-x}$      $np$         $np(1-p)$
Poisson           Poisson($\mu$)   Counts of independently occurring events                      $0, 1, 2, \ldots$   $e^{-\mu}\mu^x / x!$                $\mu$        $\mu$
Discrete uniform                   Equally likely events labelled 1 to n                         $1, 2, \ldots, n$   $1/n$                               $(n+1)/2$    $(n^2-1)/12$
5 Outlines


5.1 Background material from the Introduction to 
statistical modelling 


Graphical and numerical summaries 


1 Useful graphical representations of statistical data include bar charts, 
histograms and scatterplots. Bar charts are generally used with 
categorical data, or discrete numerical data. Histograms are generally 
used with continuous data, by grouping the data into intervals or bins. 
Scatterplots are used to display the relationship between two numerical 
variables. 


2 Measures of location include the mean, median and mode. If the n items
in a data set are denoted $x_1, x_2, \ldots, x_n$, then the sample mean, which is
denoted $\bar{x}$, is given by

\[ \bar{x} = \frac{1}{n}(x_1 + x_2 + \cdots + x_n) = \frac{1}{n}\sum_{i=1}^{n} x_i. \]

3 The median of a sample of data with an odd number of values is the middle 
value of the data set when the values are placed in order of increasing size. If 
the sample size is even, then the median is halfway between the two middle 
values. 


4 The mode of a set of categorical data is the most frequently occurring 
(or modal) value. The term mode is also used to describe a clear peak in a 
histogram or a bar chart of a set of numerical data. 


5 Measures of dispersion describe the variation within a sample around its
average value. They include the standard deviation and the variance. If the
n items in a data set with sample mean $\bar{x}$ are denoted $x_1, x_2, \ldots, x_n$, then
the sample standard deviation, denoted s, is given by

\[ s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2}. \]

The quantity $s^2$ is known as the sample variance.
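
As an illustration of the numerical summaries above, the following minimal Python sketch computes a sample mean, median and standard deviation. The function names and the data values are hypothetical, and the standard deviation is assumed to use the divisor n - 1.

```python
import math

def sample_mean(xs):
    """Sample mean: (x_1 + ... + x_n) / n."""
    return sum(xs) / len(xs)

def sample_median(xs):
    """Middle value, or halfway between the two middle values when n is even."""
    ordered = sorted(xs)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

def sample_sd(xs):
    """Sample standard deviation, assuming the divisor n - 1."""
    xbar = sample_mean(xs)
    return math.sqrt(sum((x - xbar) ** 2 for x in xs) / (len(xs) - 1))

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # made-up values
print(sample_mean(data), sample_median(data), sample_sd(data))
```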


6 The skewness of a sample is a measure of departure from symmetry. If the 
data are symmetrically distributed around the median, then the skewness is 
zero. If there is a relatively long tail of values to the right of the median, 
then the data are said to be right-skew, or positively skewed. If there is a 
relatively long tail of values to the left of the median, then the data are said 
to be left-skew, or negatively skewed. 


Probability models 
A probability model for a continuous random variable X is specified by the
probability density function (p.d.f.) f(x) of the random variable. 

A probability model for a discrete random variable X is specified by the 
probability mass function (p.m.f.) p(x) of the random variable. Details of 
specific p.d.f.s and p.m.f.s are given in the tables in Sections 3 and 4 of this 
Handbook. 


The population mean of a random variable X is denoted $\mu$ or E(X); it is
also called the expectation or expected value of X. The population
variance of X is denoted $\sigma^2$ or V(X); it is equal to $E[(X - \mu)^2]$. The
population standard deviation is $\sigma$.


The $\alpha$-quantile of a continuous random variable X is the value $q_\alpha$ such that
$\alpha = P(X \le q_\alpha)$.


The population median of X is the 0.5-quantile. The lower quartile of X 
is the 0.25-quantile, and the upper quartile of X is the 0.75-quantile. 


The central limit theorem states that if n independent random
observations are taken from a population with mean $\mu$ and variance $\sigma^2$, then
for large n the distribution of their mean $\hat{\mu}$ (also called the sampling
distribution of the mean) is approximately normal with mean $\mu$ and
variance $\sigma^2/n$. The standard deviation of the sampling distribution, which is
equal to $\sigma/\sqrt{n}$, is called the standard error of $\hat{\mu}$.


Confidence intervals 


A $100(1-\alpha)\%$ confidence interval $(\mu^-, \mu^+)$ for a population mean $\mu$,
calculated from a sample of size n with sample mean $\bar{x}$, may be used to
represent the uncertainty in the estimate $\bar{x}$ of $\mu$. The confidence interval may
be interpreted in two ways: using the repeated experiments
interpretation (based on a large number of repetitions of the experiment with
samples of size n), and using the plausible range interpretation (based on
the probability of observing a sample mean as extreme as $\bar{x}$, if $\mu$ were to take
values outside the confidence interval). These interpretations are equivalent.


Given a random sample of size n from a population with mean $\mu$, an
approximate $100(1-\alpha)\%$ confidence interval for $\mu$ is given by the
z-interval

\[ (\mu^-, \mu^+) = \left( \hat{\mu} - z\frac{s}{\sqrt{n}},\ \hat{\mu} + z\frac{s}{\sqrt{n}} \right), \]

where $\hat{\mu}$ is the sample mean, s is the sample standard deviation, and z is
$q_{1-\alpha/2}$, the $(1-\alpha/2)$-quantile of the standard normal distribution.


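An approximate $100(1-\alpha)\%$ confidence interval for a parameter $\theta$ is
given by the z-interval

\[ (\theta^-, \theta^+) = (\hat{\theta} - z\hat{\sigma},\ \hat{\theta} + z\hat{\sigma}), \]

where $\hat{\theta}$ is the sample estimate of $\theta$, $\hat{\sigma}$ is the estimated standard error of the
estimator $\hat{\theta}$, and z is $q_{1-\alpha/2}$, the $(1-\alpha/2)$-quantile of the standard normal
distribution.


When $\theta$ is a binomial proportion p, $\hat{\theta}$ is its sample estimate $\hat{p}$ and the
standard error of $\hat{p}$ may be estimated by

\[ \hat{\sigma} = \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}. \]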
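
As an illustration, the z-interval for a binomial proportion can be sketched in Python as follows. The function name and the example counts are hypothetical; the standard normal quantile comes from `statistics.NormalDist` in the standard library rather than from printed tables.

```python
import math
from statistics import NormalDist

def proportion_z_interval(successes, n, alpha=0.05):
    """Approximate 100(1 - alpha)% z-interval for a binomial proportion."""
    p_hat = successes / n                      # sample estimate of p
    se = math.sqrt(p_hat * (1 - p_hat) / n)    # estimated standard error
    z = NormalDist().inv_cdf(1 - alpha / 2)    # (1 - alpha/2)-quantile
    return p_hat - z * se, p_hat + z * se

# Made-up example: 40 successes in 100 trials.
lower, upper = proportion_z_interval(40, 100)
print(round(lower, 3), round(upper, 3))
```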


Table 2 of the statistical tables 
contains quantiles for the 
standard normal distribution. 




Significance tests 
A significance test may be used to evaluate the strength of evidence
against a null hypothesis $H_0$ of the form

\[ H_0: \theta = A. \]

The corresponding alternative hypothesis $H_1$ is

\[ H_1: \theta \ne A. \]


The strength of evidence against $H_0$ is quantified by the significance
probability or p value. The procedure for carrying out a significance test is
as follows.


• Determine the null hypothesis $H_0$ and the alternative hypothesis $H_1$.


• Choose a suitable test statistic and determine the null distribution of the
test statistic.


• Calculate the observed value of the test statistic and identify the values
that are at least as extreme as the observed value in relation to $H_0$.


• Calculate the significance probability p.


• Interpret the significance probability and report the results.


The following table provides a rough guide for interpreting p values. 


Significance probability p     Rough interpretation

$p > 0.10$                     little evidence against $H_0$
$0.10 \ge p > 0.05$            weak evidence against $H_0$
$0.05 \ge p > 0.01$            moderate evidence against $H_0$
$p \le 0.01$                   strong evidence against $H_0$


Correlation and association 


Two random variables are said to be related, or associated, if knowing 
something about the value of one variable tells you something about the 
value of the other. 


A measure of the strength of a linear association is provided by the (Pearson)
correlation coefficient. This is based on the sample covariance. For
observations $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$ on two random variables X and Y
with sample means $\bar{x}$ and $\bar{y}$ and sample standard deviations $s_X$ and $s_Y$, the
sample covariance is

\[ \mathrm{Cov}(x, y) = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y}), \]

and the correlation coefficient is

\[ r = \frac{\mathrm{Cov}(x, y)}{s_X s_Y}. \]

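
As an illustration, the sample covariance and the Pearson correlation coefficient can be computed with the short Python sketch below. The function names and data are hypothetical, and the covariance is assumed to use the divisor n - 1, matching the sample standard deviation.

```python
import math

def sample_covariance(xs, ys):
    """Sample covariance of paired observations, assuming divisor n - 1."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    return sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / (n - 1)

def correlation(xs, ys):
    """Pearson correlation: covariance divided by the product of the sample SDs."""
    s_x = math.sqrt(sample_covariance(xs, xs))
    s_y = math.sqrt(sample_covariance(ys, ys))
    return sample_covariance(xs, ys) / (s_x * s_y)

xs = [1.0, 2.0, 3.0, 4.0]   # made-up values
ys = [2.0, 4.0, 6.0, 8.0]   # exactly 2 * xs, so the association is perfectly linear
print(sample_covariance(xs, ys), correlation(xs, ys))
```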
Conditional probabilities are probabilities of the form ‘probability that
Y = y, given that X = x’, and are written

\[ P(Y = y \mid X = x). \]

Two discrete random variables X and Y are independent if, for all values of
x and y,

\[ P(Y = y \mid X = x) = P(Y = y). \]








If X and Y are not independent, they are said to be dependent, or related, 
or associated. 


5.2 Medical statistics 


Cohort and case-control studies 


1 A cohort study of the association between an exposure E and a disease D
typically includes one group with exposure E (the exposed group) and one 
group without exposure E (the control group). The groups are followed 
over time and the occurrences of disease D in each group are identified. 


A case-control study of the association between an exposure E and a 
disease D typically includes a group of cases with the disease D and a group 
of controls without the disease D, who are otherwise comparable to the 
cases. The past exposures of the cases and controls are determined and the 
occurrences of exposure E are identified.


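The risk of disease D, given exposure E, is P(D|E). The relative risk is

\[ RR = \frac{P(D \mid E)}{P(D \mid \text{not } E)}. \]

The odds of disease D, given exposure E, is

\[ OD(D \mid E) = \frac{P(D \mid E)}{P(\text{not } D \mid E)}. \]

The odds ratio is

\[ OR = \frac{P(D \mid E) \times P(\text{not } D \mid \text{not } E)}{P(\text{not } D \mid E) \times P(D \mid \text{not } E)}. \]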


Data from a cohort study may be presented in a table as follows.


                      Disease outcome
Exposure category     D      not D     Total

E                     a      b         $n_1$
not E                 c      d         $n_2$


The sample estimate of the relative risk RR from a cohort study is

\[ \widehat{RR} = \frac{a/n_1}{c/n_2}. \]


An approximate $100(1-\alpha)\%$ confidence interval for the relative risk RR is

\[ (RR^-, RR^+) = \left( \widehat{RR} \times \exp(-z\hat{\sigma}),\ \widehat{RR} \times \exp(z\hat{\sigma}) \right), \]

where z is the $(1-\alpha/2)$-quantile of the standard normal distribution, and

\[ \hat{\sigma} = \sqrt{\frac{1}{a} - \frac{1}{n_1} + \frac{1}{c} - \frac{1}{n_2}}. \]
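
As an illustration, the relative risk and its approximate confidence interval can be sketched in Python as follows. The function name and the counts are hypothetical, and the standard error of the log relative risk is assumed to be the usual large-sample expression, the square root of 1/a - 1/n1 + 1/c - 1/n2.

```python
import math
from statistics import NormalDist

def relative_risk_ci(a, b, c, d, alpha=0.05):
    """Estimate RR from a cohort 2 x 2 table, with an approximate CI."""
    n1, n2 = a + b, c + d                      # numbers exposed and not exposed
    rr = (a / n1) / (c / n2)
    se_log = math.sqrt(1 / a - 1 / n1 + 1 / c - 1 / n2)  # assumed SE of log RR
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return rr, rr * math.exp(-z * se_log), rr * math.exp(z * se_log)

# Made-up cohort: 20 of 100 exposed and 10 of 100 unexposed develop the disease.
rr, rr_lower, rr_upper = relative_risk_ci(20, 80, 10, 90)
print(rr, round(rr_lower, 2), round(rr_upper, 2))
```

The interval is symmetric about the estimate on the log scale, which is why the limits are obtained by multiplying and dividing by the same factor.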


Data from a case-control study may be presented in a table as follows.


                      Disease outcome
Exposure category     D (cases)     not D (controls)

E                     a             b
not E                 c             d
Total                 $m_1$         $m_2$


The sample estimate of the odds ratio OR from a case-control study or a
cohort study is

\[ \widehat{OR} = \frac{a \times d}{b \times c}. \]



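An approximate $100(1-\alpha)\%$ confidence interval for the odds ratio OR is

\[ (OR^-, OR^+) = \left( \widehat{OR} \times \exp(-z\hat{\sigma}),\ \widehat{OR} \times \exp(z\hat{\sigma}) \right), \]

where z is the $(1-\alpha/2)$-quantile of the standard normal distribution, and

\[ \hat{\sigma} = \sqrt{\frac{1}{a} + \frac{1}{b} + \frac{1}{c} + \frac{1}{d}}. \]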
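
As an illustration, the odds ratio and its approximate confidence interval can be sketched in Python as follows. The function name and the counts are hypothetical, and the standard error of the log odds ratio is assumed to be the usual large-sample expression, the square root of 1/a + 1/b + 1/c + 1/d.

```python
import math
from statistics import NormalDist

def odds_ratio_ci(a, b, c, d, alpha=0.05):
    """Estimate OR from a 2 x 2 table (cells a, b, c, d), with an approximate CI."""
    odds_ratio = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # assumed SE of log OR
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return odds_ratio, odds_ratio * math.exp(-z * se_log), odds_ratio * math.exp(z * se_log)

# Made-up table: a = 20, b = 80, c = 10, d = 90.
or_hat, or_lower, or_upper = odds_ratio_ci(20, 80, 10, 90)
print(or_hat, round(or_lower, 2), round(or_upper, 2))
```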


In studies with more than one exposure category, one category is chosen as 
the reference exposure category and calculations are undertaken relative to 
this reference category. 


When data are arranged in an r × c table, an approximate test for no
association between the variables uses the chi-squared test statistic

\[ \chi^2 = \sum \frac{(O_i - E_i)^2}{E_i}, \]

where the sum is taken over all r × c cells of the table, $O_i$ is the observed
frequency for the ith cell, and $E_i$ is the expected frequency for the ith cell.
The expected frequency for a cell is given by

\[ \text{expected frequency} = \frac{\text{row total} \times \text{column total}}{\text{overall total}}. \]


When the null hypothesis of no association is true,

\[ \chi^2 \approx \chi^2((r - 1)(c - 1)). \]


The approximation is adequate provided that all the expected frequencies are 
at least 5. When this is not the case, Fisher’s exact test can be used. 
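
As an illustration, the chi-squared test statistic and its degrees of freedom can be computed from a table of observed counts with the Python sketch below. The function name and the counts are hypothetical; the significance probability would then be read from chi-squared tables or obtained from statistical software.

```python
def chi_squared_statistic(table):
    """Return (statistic, degrees of freedom) for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # expected frequency = row total x column total / overall total
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df

# Made-up 2 x 2 table; all expected frequencies are at least 5.
stat, df = chi_squared_statistic([[20, 80], [10, 90]])
print(round(stat, 3), df)
```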


Bias, confounding and causation 
A study is biased if some aspects of the design, sampling, data collection or
analysis method produce results that systematically overestimate or 
underestimate the strength of association. In particular, bias may arise from 
selection bias, information bias or confounding. 


Confounding may arise if both the exposure E and the disease D are 
associated with a third variable C, known as a confounder. Confounding 
bias may be explored by stratifying the data according to the levels of the 
confounder. 




Table 3 of the statistical tables
contains quantiles for
chi-squared distributions.
Data from stratum i of a cohort study or case-control study stratified
according to the levels of a variable C may be presented in a table as follows.


Exposure category     Disease/Cases     No disease/Controls
Exposed               $a_i$             $b_i$
Not exposed           $c_i$             $d_i$


If the underlying stratum-specific odds ratios are the same for all strata, then
their common value OR is estimated by the Mantel-Haenszel odds ratio:

\[ \widehat{OR}_{MH} = \frac{\sum a_i d_i / N_i}{\sum b_i c_i / N_i}, \]

where $N_i = a_i + b_i + c_i + d_i$, and the summations are over all the strata.
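
As an illustration, the Mantel-Haenszel odds ratio can be computed from a list of stratum tables with the Python sketch below. The function name and the counts are hypothetical; each stratum is given as a tuple (a, b, c, d).

```python
def mantel_haenszel_or(strata):
    """Mantel-Haenszel estimate of the common odds ratio.

    strata: iterable of (a, b, c, d) counts, one tuple per stratum.
    """
    numerator = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    denominator = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return numerator / denominator

# Two made-up strata, each with stratum-specific odds ratio 2.25.
strata = [(20, 80, 10, 90), (10, 40, 5, 45)]
print(mantel_haenszel_or(strata))
```

When every stratum has the same odds ratio, as in this example, the Mantel-Haenszel estimate reproduces that common value.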


In a matched case-control study, the controls in each matched 
case-control set are selected so that they match the case with respect to 
the confounding variables. 


The case-control pairs from a 1-1 matched case-control study may be
presented in a table as follows.


                         Controls
                         Exposed     Not exposed

Cases    Exposed         e           f
         Not exposed     g           h


The Mantel-Haenszel estimate of the odds ratio is

\[ \widehat{OR}_{MH} = \frac{f}{g}. \]

An approximate $100(1-\alpha)\%$ confidence interval for the odds ratio is

\[ (OR^-, OR^+) = \left( \widehat{OR}_{MH} \times \exp(-z\hat{\sigma}),\ \widehat{OR}_{MH} \times \exp(z\hat{\sigma}) \right), \]

where z is the $(1-\alpha/2)$-quantile of the standard normal distribution, and

\[ \hat{\sigma} = \sqrt{\frac{1}{f} + \frac{1}{g}}. \]


McNemar’s test for no association in a 1-1 matched case-control study is
based on the test statistic

\[ \chi^2 = \frac{(|f - g| - 1)^2}{f + g}. \]


Under the null hypothesis of no association, $\chi^2 \approx \chi^2(1)$.
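
As an illustration, the matched-pairs odds ratio and McNemar's test can be sketched in Python as follows. The function name and the counts of discordant pairs are hypothetical. The statistic is assumed to take the usual continuity-corrected form, and the significance probability uses the fact that a chi-squared variable on 1 degree of freedom is the square of a standard normal variable.

```python
import math
from statistics import NormalDist

def mcnemar(f, g):
    """Return (odds ratio, test statistic, p value) for discordant pair counts f and g."""
    odds_ratio = f / g
    statistic = (abs(f - g) - 1) ** 2 / (f + g)   # assumed continuity-corrected form
    # P(chi-squared(1) > s) = 2 * (1 - Phi(sqrt(s)))
    p_value = 2 * (1 - NormalDist().cdf(math.sqrt(statistic)))
    return odds_ratio, statistic, p_value

# Made-up study: 30 pairs with only the case exposed, 10 with only the control exposed.
or_mh, chi2, p = mcnemar(30, 10)
print(or_mh, chi2, round(p, 4))
```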


The presence of an interaction between a stratifying variable C and the 
association between an exposure E and a disease outcome D may be 
investigated using a significance test of homogeneity. 


If there are k strata, the null hypothesis is $OR_1 = OR_2 = \cdots = OR_k$, where
$OR_i$ is the odds ratio for stratum i. Tarone’s test for homogeneity is
based on a test statistic whose distribution is approximately $\chi^2(k - 1)$ under
the null hypothesis.


Association does not imply causation. Bradford Hill’s criteria for
causation may help in assessing whether an association is causal. 


A dose is a quantified exposure. A dose-response relationship exists 
between an exposure E and a disease D if the risk (or odds) of disease varies 
according to the dose of that exposure. 




Table 3 of the statistical tables
contains quantiles for $\chi^2(1)$.


The presence of a dose-response relationship may be investigated using the
chi-squared test for no linear trend. The null hypothesis for this
significance test is that the log odds of disease does not increase or decrease
linearly with the dose. Under the null hypothesis, the distribution of the test
statistic is approximately $\chi^2(1)$.


Randomized controlled trials and the medical literature 
A randomized controlled trial is a cohort study in which participants are
randomly allocated to treatment and control groups. Stratified 
randomization, in which participants are randomized by blocks, may be 
used to improve balance in the characteristics of the patients allocated to 
the different groups. Bias is further reduced by using concealment 
procedures such as double blinding or single blinding. 


The flow chart of the trial documents the numbers of participants included 
and excluded at each stage of the trial. The recommended method of 
analysis of randomized controlled trials is by intention to treat. In an 
intention-to-treat analysis, the groups analysed are as close as possible to 
those randomized. An alternative method of analysis is per protocol. In a 
per-protocol analysis, only participants who complete the treatment to which 
they were randomized are included. 


Pharmaceutical drugs are evaluated in clinical trials. The evaluation 
progresses through four phases. Phase III studies are always randomized 
controlled trials. An independent Data Monitoring Committee reviews 
the data and can halt a trial on ethical grounds. 


The sample size required for a randomized controlled trial to compare the 
effect of treatment on a disease D is derived within the framework of 
fixed-level testing. The null and alternative hypotheses may be written as 


\[ H_0: p_T = p_C, \qquad H_1: p_T \ne p_C, \]


where $p_T$ is the probability of disease in the treatment group, and $p_C$ is the
probability of disease in the control group.


A Type I error is said to occur if the null hypothesis $H_0$ is rejected when it
is true. A Type II error is said to occur if the null hypothesis $H_0$ is not
rejected when it is false.


The significance level of the test, $\alpha$, is the probability of a Type I error.
The power of the test, $\gamma$, is the probability of avoiding a Type II error.


To calculate the sample size for a trial with two groups of equal size, the
design values $\pi_T$ and $\pi_C$, the significance level $\alpha$ and the power $\gamma$ must be
specified. The sample size n for each trial group is given approximately by

\[ n = \frac{2(q_{1-\alpha/2} + q_\gamma)^2\, \pi_0(1 - \pi_0)}{(\pi_T - \pi_C)^2}, \]

where $q_{1-\alpha/2}$ and $q_\gamma$ denote, respectively, the $(1-\alpha/2)$-quantile and the
$\gamma$-quantile of the standard normal distribution, and $\pi_0 = (\pi_T + \pi_C)/2$.
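
As an illustration, the approximate sample size per group can be sketched in Python as follows. The function name, the default significance level and power, and the design values in the example are hypothetical; rounding up to a whole number of participants is an added convention, not part of the formula.

```python
import math
from statistics import NormalDist

def sample_size_per_group(pi_t, pi_c, alpha=0.05, power=0.8):
    """Approximate number of participants needed in each of two equal groups."""
    q_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # (1 - alpha/2)-quantile
    q_gamma = NormalDist().inv_cdf(power)           # gamma-quantile
    pi_0 = (pi_t + pi_c) / 2
    n = 2 * (q_alpha + q_gamma) ** 2 * pi_0 * (1 - pi_0) / (pi_t - pi_c) ** 2
    return math.ceil(n)   # round up (a convention assumed here)

# Made-up design values: 10% disease risk on treatment, 20% on control.
n_per_group = sample_size_per_group(0.10, 0.20)
print(n_per_group)
```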


The power $\gamma$ available in a trial with two groups each of size n is obtained
from $q_\gamma$, the $\gamma$-quantile of the standard normal distribution, which is given by
the expression

\[ q_\gamma = |\pi_T - \pi_C| \sqrt{\frac{n}{2\pi_0(1 - \pi_0)}} - q_{1-\alpha/2}. \]


The notation in this expression is the same as that used in 24. 
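
As an illustration, the power available for a given group size can be sketched in Python as follows. The function name and design values are hypothetical; the power is recovered from its quantile by applying the standard normal distribution function.

```python
import math
from statistics import NormalDist

def power_for_group_size(n, pi_t, pi_c, alpha=0.05):
    """Approximate power of a two-group trial with n participants per group."""
    pi_0 = (pi_t + pi_c) / 2
    q_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    q_gamma = abs(pi_t - pi_c) * math.sqrt(n / (2 * pi_0 * (1 - pi_0))) - q_alpha
    return NormalDist().cdf(q_gamma)   # gamma is the probability below q_gamma

# Made-up design: 201 participants per group, risks of 10% and 20%.
gamma = power_for_group_size(201, 0.10, 0.20)
print(round(gamma, 3))
```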


26 Evidence from all available studies, or all available studies of a particular
type, may be reviewed together as part of a systematic review. The 
selection of studies in such a review is particularly important in order to 
avoid publication bias. Sometimes a quantitative assessment of the 
strength of evidence from several studies may be possible by combining their 
results in a meta-analysis. 


27 In a meta-analysis, the results of several studies are combined to obtain a
pooled odds ratio and confidence interval, for example using the
Mantel-Haenszel odds ratio (see 11). The presence of heterogeneity
between studies may be investigated using Tarone’s test for homogeneity 
(see 15). A forest plot is used to display the results of a meta-analysis. 


28 Medical papers often contain statistical analyses. A typical medical paper 
includes the following sections: Abstract, Introduction, Methods, 
Results, Discussion. 


5.3 Time series 


Decomposition models 


1 A time series is a collection of observations $X_t$ on some random variable X
at equally-spaced times $1, 2, \ldots, t, t+1, \ldots$. A time plot is a graph of the
observed values $x_t$ against t.


2 A cycle is a regular pattern that repeats at fixed intervals. The time interval
between cycles is the period. A cycle whose period is determined by the 
natural clock is seasonal. A seasonal cycle with period one year is annual. 
Seasonality may be displayed using a seasonal plot. 


3 The additive decomposition model for a time series $X_t$ is

\[ X_t = m_t + s_t + W_t, \quad t = 1, 2, \ldots, \]

where $m_t$ is the trend component, $s_t$ is the seasonal component of
period T, and $W_t$ is the irregular (or random) component, sometimes also
described as noise. The seasonal component satisfies

\[ s_t = s_{t+T} \text{ for all } t, \qquad s_1 + s_2 + \cdots + s_T = 0. \]

The distinct values $s_1, \ldots, s_T$ are the seasonal factors.
The irregular component $W_t$ has mean 0 and variance $\sigma^2$:

\[ E(W_t) = 0, \quad V(W_t) = \sigma^2. \]


4 The multiplicative decomposition model for a time series $X_t$ which
takes only positive values is

\[ X_t = m_t \times s_t \times W_t. \]

The seasonal component $s_t$ satisfies

\[ s_t = s_{t+T} \text{ for all } t, \qquad s_1 \times s_2 \times \cdots \times s_T = 1. \]


5 The simple moving average of order 2q + 1 centred on t is given by the
transformation

\[ MA(t) = \frac{1}{2q + 1}\left( X_{t-q} + \cdots + X_t + \cdots + X_{t+q} \right). \]


6 A weighted moving average of order 2q + 1 has the form

\[ MA(t) = a_{-q}X_{t-q} + \cdots + a_{-1}X_{t-1} + a_0 X_t + a_1 X_{t+1} + \cdots + a_q X_{t+q}, \]

where the weights $a_j$, $j = -q, -q+1, \ldots, q$, add up to 1.
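
As an illustration, simple and weighted moving averages can be sketched in Python as follows. The function names and data are hypothetical. Values are returned only for the time points at which the whole window fits inside the series, so the first q and last q positions are omitted.

```python
def weighted_moving_average(xs, weights):
    """Weighted moving average of order 2q + 1.

    weights: the 2q + 1 weights a_{-q}, ..., a_q, which should add up to 1.
    """
    q = len(weights) // 2
    smoothed = []
    for t in range(q, len(xs) - q):
        window = xs[t - q : t + q + 1]
        smoothed.append(sum(a * x for a, x in zip(weights, window)))
    return smoothed

def simple_moving_average(xs, q):
    """Simple moving average of order 2q + 1: every weight equals 1 / (2q + 1)."""
    order = 2 * q + 1
    return weighted_moving_average(xs, [1 / order] * order)

series = [1.0, 2.0, 3.0, 4.0, 5.0]   # made-up values
print(simple_moving_average(series, 1))
print(weighted_moving_average(series, [0.25, 0.5, 0.25]))
```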
Simple and weighted moving averages may be used for smoothing a time
series. The order of the moving average should be chosen so as to avoid 
both over-smoothing and under-smoothing the time series. 


For a seasonal time series $X_t$, which may be described by an additive model,
and for which the seasonal period is T (an even number), the seasonal
component $s_t$ may be estimated as follows.


First, the series is smoothed using a suitable weighted moving average SA(t).
Then the series of differences $y_t = x_t - SA(t)$ is obtained, and the raw
seasonal factors $F_j$, $j = 1, \ldots, T$, are calculated by averaging the values $y_t$
for each season. Finally, the seasonal factors are estimated by

\[ \hat{s}_j = F_j - \bar{F}, \quad j = 1, \ldots, T, \]

where $\bar{F}$ is the average of the raw seasonal factors.


A time series is seasonally adjusted if its seasonal component has been 
estimated and removed, leaving only a trend component and an irregular 
component. 


Forecasting 
Forecasting is the process of predicting future values of a time series based
on the past values of the time series. A forecast $\hat{x}_{n+1}$ of $X_{n+1}$ based on
$x_n, x_{n-1}, x_{n-2}, \ldots$ is called a 1-step ahead forecast of $X_{n+1}$.


If a time series $X_t$ is described by an additive model with constant level and
no seasonality, then 1-step ahead forecasts may be obtained by simple
exponential smoothing using the formula

\[ \hat{x}_{n+1} = \alpha x_n + (1 - \alpha)\hat{x}_n, \]

where $x_n$ is the observed value at time n, $\hat{x}_n$ and $\hat{x}_{n+1}$ are the 1-step ahead
forecasts of $X_n$ and $X_{n+1}$, and $\alpha$ is a smoothing parameter, $0 \le \alpha \le 1$.
The method requires an initial value $\hat{x}_1$.


The 1-step ahead forecast error is the difference between the observed
value and the 1-step ahead forecast of $X_t$: $e_t = x_t - \hat{x}_t$. The sum of
squared errors, or SSE, is given by

\[ SSE = \sum_{t=1}^{n} e_t^2 = \sum_{t=1}^{n} (x_t - \hat{x}_t)^2. \]
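
As an illustration, simple exponential smoothing and the resulting SSE can be sketched in Python as follows. The function names, data, smoothing parameter and initial value are hypothetical; in practice the smoothing parameter would be chosen by minimizing the SSE.

```python
def simple_exponential_smoothing(xs, alpha, initial):
    """Return the 1-step ahead forecasts of X_1, ..., X_{n+1}.

    initial is the starting forecast of X_1, which the method requires.
    """
    forecasts = [initial]
    for x in xs:
        # next forecast = alpha * observed value + (1 - alpha) * current forecast
        forecasts.append(alpha * x + (1 - alpha) * forecasts[-1])
    return forecasts

def sum_of_squared_errors(xs, forecasts):
    """SSE over times 1..n; zip ignores the extra forecast of X_{n+1}."""
    return sum((x - f) ** 2 for x, f in zip(xs, forecasts))

observed = [10.0, 12.0, 11.0]   # made-up values
forecasts = simple_exponential_smoothing(observed, alpha=0.5, initial=10.0)
print(forecasts, sum_of_squared_errors(observed, forecasts))
```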


If a time series $X_t$ is described by an additive model with a linear trend
component and no seasonality, then 1-step ahead forecasts may be obtained
by Holt’s exponential smoothing. There are two smoothing parameters:
$\alpha$ for the level and $\gamma$ for the slope.


If in addition the time series has a seasonal component, forecasts may be
obtained using Holt-Winters exponential smoothing. There are three
smoothing parameters: $\alpha$ for the level, $\gamma$ for the slope and $\delta$ for the seasonal
component.


For all exponential smoothing methods, optimal values of the smoothing 
parameters are obtained by minimizing the SSE. 


Suppose that $X_t$ is a time series with n observed values $x_1, x_2, \ldots, x_n$. The
time series lagged by k places is the time series with $X_{t-k}$ in position t.
The first k positions of the lagged series comprise missing values.


The sample autocorrelation at lag k is a correlation coefficient $r_k$
calculated between a time series and a copy of itself, lagged by k places. It is
calculated using the n - k pairs of points $(x_1, x_{k+1}), (x_2, x_{k+2}), \ldots, (x_{n-k}, x_n)$.
The population autocorrelations $\rho_k$, $k = 1, 2, \ldots$, define the autocorrelation
function, or ACF. Under the null hypothesis $\rho_k = 0$, the distribution of the
sample autocorrelation calculated from a time series with n time points is
approximately normal with mean 0 and variance 1/n.


The sample autocorrelations may be displayed on a correlogram or sample
ACF plot. Significance bounds are horizontal lines plotted at positions
$\pm 1.96/\sqrt{n}$ on the correlogram.
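
As an illustration, a sample autocorrelation and the significance bounds can be sketched in Python as follows. The function names and data are hypothetical. The sketch takes the description literally and computes the Pearson correlation over the n - k pairs; statistical packages often use a closely related estimator based on the overall mean, so their values may differ slightly.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of paired values."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    sxx = sum((x - xbar) ** 2 for x in xs)
    syy = sum((y - ybar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def sample_autocorrelation(xs, k):
    """Correlation between the series and itself lagged by k places (k >= 1)."""
    return pearson(xs[:-k], xs[k:])   # the n - k pairs (x_t, x_{t+k})

def significance_bound(n):
    """Half-width of the approximate 5% significance bounds on a correlogram."""
    return 1.96 / math.sqrt(n)

alternating = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]   # made-up series
print(sample_autocorrelation(alternating, 1), significance_bound(len(alternating)))
```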





For a fixed number k of lags, the null hypothesis

\[ H_0: \rho_1 = \rho_2 = \cdots = \rho_k = 0 \]

may be tested using a portmanteau test such as the Ljung-Box test.





A $100(1-\alpha)\%$ prediction interval for $X_{n+1}$, given observed values up to
and including $x_n$, is an interval with probability $1 - \alpha$ of containing $X_{n+1}$.


Suppose that a 1-step ahead forecast $\hat{x}_{n+1}$ for $X_{n+1}$ has been obtained,
together with the SSE, the sum of squared forecast errors at times $1, 2, \ldots, n$.
An approximate $100(1-\alpha)\%$ prediction interval for $X_{n+1}$ is given by

\[ \left( \hat{x}_{n+1} - z\sqrt{\frac{SSE}{n}},\ \hat{x}_{n+1} + z\sqrt{\frac{SSE}{n}} \right), \]

where z is the $(1-\alpha/2)$-quantile of the standard normal distribution. The
assumptions required are that the forecast errors are normally distributed
with mean zero and constant variance, and that the autocorrelations between
the forecast errors are zero at lags $k \ge 1$.
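
As an illustration, the approximate prediction interval can be sketched in Python as follows. The function name and the input values are hypothetical; the sketch takes the forecast, the SSE and the number of observations as given and does not check the assumptions about the forecast errors.

```python
import math
from statistics import NormalDist

def prediction_interval(forecast, sse, n, alpha=0.05):
    """Approximate 100(1 - alpha)% prediction interval for the next value."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half_width = z * math.sqrt(sse / n)   # z times the root mean squared error
    return forecast - half_width, forecast + half_width

# Made-up inputs: forecast 11.0, SSE 4.0 from n = 4 forecast errors.
pi_lower, pi_upper = prediction_interval(11.0, 4.0, 4)
print(round(pi_lower, 2), round(pi_upper, 2))
```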


A time series $Z_t$ is said to be white noise if $Z_t$ is normally distributed with
mean zero and constant variance $\sigma^2$, and the autocorrelations at all lags
$k \ge 1$ are zero.


ARIMA models 
A time series $X_t$ is stationary in mean if it has constant mean, $E(X_t) = \mu$.
It is stationary in variance if it has constant variance, $V(X_t) = \sigma^2$. It is
stationary in correlation if for all k, $\rho_k$, the autocorrelation between $X_t$
and $X_{t-k}$, depends only on the lag k. The time series is stationary if it is
stationary in mean, in variance and in correlation.


The partial autocorrelation at lag k, $\alpha_k$, is a measure of the direct
dependence between $X_t$ and $X_{t-k}$. The partial autocorrelations $\alpha_k$,
$k = 0, 1, 2, \ldots$, define the partial autocorrelation function, or PACF.
The partial correlogram, or sample PACF plot, is a bar chart of the
sample PACF.


Let $X_t$ be a stationary time series with mean $\mu$. The autoregressive model
of order p, or AR(p) model, has the form

\[ X_t - \mu = \beta_1(X_{t-1} - \mu) + \beta_2(X_{t-2} - \mu) + \cdots + \beta_p(X_{t-p} - \mu) + Z_t, \]

where $\beta_1, \beta_2, \ldots, \beta_p$ are parameters to be estimated, and $Z_t$ is white noise
with mean 0 and variance $\sigma^2$.


The ACF for an AR(1) model is given by $\rho_k = \beta_1^k$ for $k \ge 0$. The ACF for an
AR(p) model tails off to zero in magnitude, either exponentially or in a
damped sinusoidal pattern, as the lag increases.


The PACF for an AR(p) model satisfies $\alpha_p = \beta_p$ and $\alpha_k = 0$ for lags $k > p$.


Let $X_t$ be a stationary time series with mean $\mu$. The moving average
model of order q, or MA(q) model, has the form

\[ X_t - \mu = Z_t - \theta_1 Z_{t-1} - \cdots - \theta_q Z_{t-q}, \]

where $\theta_1, \theta_2, \ldots, \theta_q$ are parameters to be estimated, and $Z_t$ is white noise
with mean 0 and variance $\sigma^2$.


The ACF for an MA(q) model satisfies

\[ \rho_q = \frac{-\theta_q}{1 + \theta_1^2 + \cdots + \theta_q^2} \]

and $\rho_k = 0$ for $k > q$.
The PACF for an MA(q) model tails off to zero in magnitude, either
exponentially or in a damped sinusoidal pattern, as the lag increases.


Let $X_t$ be a stationary time series with mean $\mu$. The autoregressive
moving average model of order $(p, q)$, or ARMA($p$, $q$) model, has the
form

\[ X_t - \mu = \beta_1 (X_{t-1} - \mu) + \cdots + \beta_p (X_{t-p} - \mu) + Z_t - \theta_1 Z_{t-1} - \cdots - \theta_q Z_{t-q}. \]


An autoregressive integrated moving average model of order $(p, d, q)$, or
ARIMA($p$, $d$, $q$) model, is an ARMA($p$, $q$) model applied to a time series
after differencing of order $d$.
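Differencing of order d can be sketched as d passes of first differencing. The helper name `difference` and the example series are illustrative assumptions.

```python
def difference(series, d=1):
    """Apply first differencing d times: each pass replaces x_t by x_t - x_{t-1}."""
    out = list(series)
    for _ in range(d):
        out = [out[t] - out[t - 1] for t in range(1, len(out))]
    return out


quadratic = [t * t for t in range(6)]  # 0, 1, 4, 9, 16, 25: a quadratic trend
once = difference(quadratic, d=1)      # [1, 3, 5, 7, 9]: still trending
twice = difference(quadratic, d=2)     # [2, 2, 2, 2]: constant level
```

Each pass shortens the series by one value, so a series of length n has n - d values after differencing of order d.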


The key features of ARMA models are summarized in the table below. 


Model            Notation    ACF                 PACF

White noise      ARMA(0,0)   Zero at lags > 0    Zero at lags > 0
Autoregressive   ARMA(p,0)   Tails off to zero   Zero after lag p
Moving average   ARMA(0,q)   Zero after lag q    Tails off to zero
Mixed            ARMA(p,q)   Tails off to zero   Tails off to zero


The principle of parsimony in selecting an ARIMA model is to keep the 
value of p + q to a minimum. 


The steps involved in selecting an ARIMA model for a non-seasonal time
series are as follows.

• Check that an additive model is appropriate. If it is not appropriate,
  then transform the series to obtain a series that can be represented by
  an additive model.

• Identify the order of differencing, $d$, required to obtain stationarity.

• Identify those ARIMA($p$, $d$, $q$) models that are consistent with the
  correlogram and partial correlogram for the stationary series.

• Choose the model(s) with the lowest value of $p + q$.


After fitting an ARIMA model, its adequacy should be checked, as follows.

• Check the fit of the model by plotting the time series and the 1-step
  ahead forecasts on a multiple time plot.

• Verify that the distribution of the forecast errors is approximately
  normal with mean zero and constant variance.

• Use the correlogram for the forecast errors and the Ljung-Box
  test (see 18) to check that the forecast errors are uncorrelated.


5.4 Multivariate analysis 


Describing and displaying multivariate data 


1 


A multivariate data set comprises observations on two or more random
variables. A bivariate data set has two variables. The number of
variables, $p$, is the dimension of the data set. An observation is the set of
$p$ measurements made on one sampled unit. The variables $X_1, \ldots, X_p$ form
the columns of the $n \times p$ data matrix $X$, where $n$ is the number of
observations.


Multivariate data may be displayed using two-dimensional scatterplots, 
three-dimensional scatterplots, matrix scatterplots and profile plots. 
The mean vector for a data set with $n$ observations and $p$ variables
$X_1, \ldots, X_p$ is $\bar{\mathbf{x}} = (\bar{x}_1, \ldots, \bar{x}_p)$, where
$\bar{x}_j$ is the sample mean of $X_j$,

\[ \bar{x}_j = \frac{1}{n} \sum_{i=1}^{n} x_{ij}. \]


The sample covariance between $X_j$ and $X_k$ is

\[ s_{jk} = \frac{1}{n - 1} \sum_{i=1}^{n} (x_{ij} - \bar{x}_j)(x_{ik} - \bar{x}_k). \]


The covariance between a variable $X_j$ and itself is the sample variance
of $X_j$; that is, $s_{jj} = s_j^2$.
The variance-covariance matrix, or covariance matrix, of $X_1, \ldots, X_p$ is
a square matrix $S$ with $p$ rows and $p$ columns. Element $(j, k)$ of $S$ is
$s_{jk}$, the sample covariance between $X_j$ and $X_k$. The diagonal element
$(j, j)$ is $s_j^2$, the sample variance of $X_j$.


In standardization, each variable $X_j$ is transformed separately in such a
way that the transformed variable $Z_j$ has mean 0 and variance 1. For
observation $i$, the value $x_{ij}$ of $X_j$ is transformed to obtain the value
$z_{ij}$ of $Z_j$, as follows:

\[ z_{ij} = \frac{x_{ij} - \bar{x}_j}{s_j}, \]

where $\bar{x}_j$ is the sample mean and $s_j$ is the sample standard deviation
of $X_j$.

The numbers $z_{ij}$ do not have any units associated with them, so the
standardized variable $Z_j$ is scale-free.


The correlation matrix of $X_1, \ldots, X_p$ is the covariance matrix of the
standardized variables $Z_1, \ldots, Z_p$. Element $(j, k)$ is the correlation
coefficient between $X_j$ and $X_k$, denoted Corr$(X_j, X_k)$. The diagonal
elements of the correlation matrix are all equal to 1.
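The link between the covariance matrix, standardization and the correlation matrix can be sketched in a few lines. The function names and the small data matrix below are made up for illustration; rows are observations and columns are variables.

```python
import math


def column_means(data):
    """Mean vector: one sample mean per column of the n x p data matrix."""
    n = len(data)
    return [sum(row[j] for row in data) / n for j in range(len(data[0]))]


def covariance_matrix(data):
    """Sample covariance matrix S, with divisor n - 1."""
    n, p = len(data), len(data[0])
    means = column_means(data)
    return [
        [
            sum((row[j] - means[j]) * (row[k] - means[k]) for row in data) / (n - 1)
            for k in range(p)
        ]
        for j in range(p)
    ]


def standardize(data):
    """Transform each variable to have sample mean 0 and sample variance 1."""
    means = column_means(data)
    s = covariance_matrix(data)
    sds = [math.sqrt(s[j][j]) for j in range(len(means))]
    return [[(row[j] - means[j]) / sds[j] for j in range(len(means))] for row in data]


# Hypothetical data: 4 observations on 2 variables.
data = [[1.0, 10.0], [2.0, 14.0], [3.0, 13.0], [4.0, 19.0]]
S = covariance_matrix(data)
R = covariance_matrix(standardize(data))  # correlation matrix of the original data
```

The off-diagonal entry of `R` equals the covariance in `S` divided by the product of the two standard deviations, and the diagonal entries of `R` are 1.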


Reducing dimension 
Two approximations $Y_1$ and $Y_2$ to a multivariate data set are equivalent if
constants $c_1 \ne 0$ and $c_2$ can be found such that $Y_2 = c_1 Y_1 + c_2$.


For a data set of dimension $p$ with variables $X_1, \ldots, X_p$, the (first)
principal component of the data, denoted $Y$, is the linear combination

\[ Y = \sum_{j=1}^{p} \alpha_j (X_j - \bar{x}_j), \]

where the loadings vector $\alpha = (\alpha_1, \ldots, \alpha_p)$ is chosen so
that the variance of $Y$ is maximized, subject to the constraint

\[ \sum_{j=1}^{p} \alpha_j^2 = 1. \]


For a data set with $p$ variables $X_1, \ldots, X_p$, the variance of the linear
combination

\[ Y = \sum_{j=1}^{p} \alpha_j (X_j - \bar{x}_j) \]

can be calculated from the variances and covariances of the original variables
using the formula

\[ V(Y) = \sum_{j=1}^{p} \alpha_j^2 V(X_j) + 2 \sum_{j,k:\,k>j} \alpha_j \alpha_k \,\text{Cov}(X_j, X_k). \]
For a multivariate data set with $p$ variables $X_1, \ldots, X_p$, the total
variance, TV, is

\[ \text{TV} = \sum_{j=1}^{p} V(X_j). \]

The percentage variance explained, PVE, by a linear combination $Y$ is

\[ \text{PVE} = \frac{V(Y)}{\text{TV}} \times 100\%. \]


For a data set of dimension $p$ with variables $X_1, \ldots, X_p$, the $k$th
principal component of the data, denoted $Y_k$, is the linear combination

\[ Y_k = \sum_{j=1}^{p} \alpha_{kj} (X_j - \bar{x}_j), \]

where the loadings vector $\alpha_k = (\alpha_{k1}, \ldots, \alpha_{kp})$ is
chosen so that the variance of $Y_k$ is maximized, subject to the following
constraints:

\[ \sum_{j=1}^{p} \alpha_{kj}^2 = 1, \]

$Y_k$ is uncorrelated with $Y_1, \ldots, Y_{k-1}$.


In some circumstances, it is preferable, or even essential, to calculate
principal components using standardized data. In this case, the $k$th principal
component has the form

\[ Y_k = \sum_{j=1}^{p} \alpha_{kj} Z_j. \]


The cumulative percentage variance explained, CPVE, by the first $k$
principal components is given by

\[ \text{CPVE} = \frac{V(Y_1) + \cdots + V(Y_k)}{\text{TV}} \times 100\%. \]

Kaiser's criterion for choosing the number of principal components is to
retain components with variance greater than the average of the variances of
the original variables.


In a scree plot, the elbow is the point at which the plot flattens out. The 
point preceding the elbow indicates the last component to be retained. 
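The quantities used in principal component analysis can be sketched numerically. The code below is an illustration only: it finds the first loadings vector by power iteration on the covariance matrix, which is one common numerical technique and is an assumption here rather than the handbook's method. The covariance matrix `S` and all function names are made up.

```python
import math


def linear_combination_variance(alpha, S):
    """V(Y) = sum_j alpha_j^2 V(X_j) + 2 sum_{k>j} alpha_j alpha_k Cov(X_j, X_k)."""
    p = len(alpha)
    total = sum(alpha[j] ** 2 * S[j][j] for j in range(p))
    total += 2 * sum(
        alpha[j] * alpha[k] * S[j][k] for j in range(p) for k in range(j + 1, p)
    )
    return total


def first_loadings(S, iterations=500):
    """Approximate the unit-length loadings that maximize V(Y), by power iteration."""
    p = len(S)
    alpha = [1.0 / (j + 1) for j in range(p)]  # arbitrary non-zero starting vector
    for _ in range(iterations):
        nxt = [sum(S[j][k] * alpha[k] for k in range(p)) for j in range(p)]
        norm = math.sqrt(sum(v * v for v in nxt))
        alpha = [v / norm for v in nxt]  # rescale so the squared loadings sum to 1
    return alpha


S = [[2.0, 1.0], [1.0, 2.0]]  # hypothetical covariance matrix of X_1 and X_2
alpha = first_loadings(S)
v_y = linear_combination_variance(alpha, S)
tv = sum(S[j][j] for j in range(len(S)))  # total variance
pve = 100.0 * v_y / tv                    # percentage variance explained
kaiser_threshold = tv / len(S)            # average variance of the original variables
retain_first = v_y > kaiser_threshold
```

For this matrix the first component has variance 3 out of a total variance of 4, so it explains 75% of the variance and is retained under Kaiser's criterion.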


Discrimination 
Suppose that a multivariate data set comprises observations on $G$ groups,
that $n_g$ is the size of group $g$, and that $\bar{x}_g$ is the mean of a
variable $X$ in group $g$. Let $N$ denote the total number of observations in
the $G$ groups: $N = n_1 + \cdots + n_G$. The grand mean of $X$ is denoted
$\bar{x}$ and is given by

\[ \bar{x} = \frac{1}{N} \sum_{g=1}^{G} n_g \bar{x}_g. \]




Suppose that the variance of $X$ in group $g$ is $s_g^2$. The between-groups
variance of $X$, denoted $V_b$, and the within-groups variance of $X$,
denoted $V_w$, are given by

\[ V_b = \frac{1}{N} \sum_{g=1}^{G} n_g (\bar{x}_g - \bar{x})^2, \]

\[ V_w = \frac{1}{N - G} \sum_{g=1}^{G} (n_g - 1) s_g^2. \]

The separation achieved by a variable $X$ is given by the ratio of the
between-groups variance to the within-groups variance of $X$:

\[ \text{separation} = \frac{V_b}{V_w}. \]

The within-groups covariance for a pair of variables $X_i$ and $X_j$, which is
denoted Cov$_w(X_i, X_j)$, is the weighted average of the covariances of $X_i$
and $X_j$ calculated for each of the groups separately. The between-groups
covariance of variables $X_i$ and $X_j$, which is denoted Cov$_b(X_i, X_j)$, is
the covariance between the group means for $X_i$ and $X_j$. The within-groups
covariance matrix $W$ has $(i, j)$th element Cov$_w(X_i, X_j)$. The
between-groups covariance matrix $B$ has $(i, j)$th element Cov$_b(X_i, X_j)$.


For a linear combination $D$ of variables of the form

\[ D = \sum_{j=1}^{p} \alpha_j (X_j - \bar{x}_j), \]

the between-groups variance of $D$, denoted $V_b(D)$, and the within-groups
variance of $D$, denoted $V_w(D)$, are given by

\[ V_b(D) = \sum_{j=1}^{p} \alpha_j^2 V_b(X_j) + 2 \sum_{j,k:\,k>j} \alpha_j \alpha_k \,\text{Cov}_b(X_j, X_k), \]

\[ V_w(D) = \sum_{j=1}^{p} \alpha_j^2 V_w(X_j) + 2 \sum_{j,k:\,k>j} \alpha_j \alpha_k \,\text{Cov}_w(X_j, X_k). \]

The separation achieved by $D$, denoted Sep$(D)$, is the ratio of the
between-groups variance of $D$ to the within-groups variance of $D$:

\[ \text{Sep}(D) = \frac{V_b(D)}{V_w(D)}. \]

In canonical discrimination, the (first) discriminant function $D$ is the
linear combination

\[ D = \sum_{j=1}^{p} \alpha_j (X_j - \bar{x}_j) \]

for which the separation is maximized, subject to a constraint on the
loadings $\alpha_1, \ldots, \alpha_p$. Commonly used constraints are

\[ \sum_{j=1}^{p} \alpha_j^2 = 1 \]

and

\[ V_w(D) = 1. \]


In canonical discrimination, the standardized version $Z_j$ of a variable $X_j$
is defined so that $Z_j$ has mean 0 and within-groups variance 1, using the
formula

\[ Z_j = \frac{X_j - \bar{x}_j}{\sqrt{V_w(X_j)}}. \]

The variable $Z_j$ is called the group-standardized variable.
The discriminant function

\[ D = \sum_{j=1}^{p} \alpha_j (X_j - \bar{x}_j) \]

may be written in terms of the group-standardized variables as follows:

\[ D = \sum_{j=1}^{p} a_j Z_j, \]

where the loadings $a_j$ are given by

\[ a_j = \alpha_j \sqrt{V_w(X_j)}. \]

The separation achieved by the discriminant function $D$ is the same whether
$D$ is based on unstandardized or group-standardized variables.


The $k$th discriminant function $D_k$ is the linear combination

\[ D_k = \sum_{j=1}^{p} \alpha_{kj} (X_j - \bar{x}_j) \]

that maximizes the separation, subject to the within-groups covariance
between $D_k$ and $D_{k-1}, \ldots, D_1$ being zero, and subject to a constraint
on the loadings $\alpha_{kj}$ (see 20). The $k$th discriminant function may also
be written in terms of group-standardized variables as follows:

\[ D_k = \sum_{j=1}^{p} a_{kj} Z_j, \]

with $a_{kj} = \alpha_{kj} \sqrt{V_w(X_j)}$.


The total separation is the sum of the separations achieved by all $p$
discriminant functions:

\[ \text{total separation} = \text{Sep}(D_1) + \cdots + \text{Sep}(D_p). \]


The percentage separation achieved by the discriminant function $D_j$,
denoted PSA$_j$, is

\[ \text{PSA}_j = \frac{\text{Sep}(D_j)}{\text{total separation}} \times 100\%. \]

The cumulative percentage separation achieved by $D_1, \ldots, D_j$,
denoted CPSA$_j$, is

\[ \text{CPSA}_j = \text{PSA}_1 + \cdots + \text{PSA}_j. \]


An allocation rule for $G$ groups based on the discriminant function is
defined by $G - 1$ cut-off points or cutpoints $l_1, \ldots, l_{G-1}$ such that
$l_1 < l_2 < \cdots < l_{G-1}$. The allocation rule is of the following form:

if $d < l_1$                     allocate to group 1,
if $l_1 \le d < l_2$             allocate to group 2,
...
if $l_{G-2} \le d < l_{G-1}$     allocate to group $G - 1$,
otherwise                        allocate to group $G$.


In choosing the cutpoints, three factors must be considered. 


• For each group $g$, the probability density function of the values of
  the discriminant function for an observation randomly selected from all
  those known to be in group $g$.

• For each group $g$, the prior probability that an observation randomly
  chosen belongs to group $g$.

• For each pair of groups, the cost of wrongly allocating an observation to
  one group when it actually belongs to the other group.
In practice, it is often assumed that the distribution of values of $D$ for
group $g$ is normal with mean $\mu_g$, and that the distributions for the groups
have common variance. If the groups are numbered so that
$\mu_1 < \mu_2 < \cdots < \mu_G$, then with the above assumption and under the
assumptions of equal prior probabilities and equal costs, the cutpoints are
given by

\[ l_g = \tfrac{1}{2} (\mu_g + \mu_{g+1}), \quad g = 1, \ldots, G - 1. \]
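The midpoint cutpoints and the resulting allocation rule can be sketched as follows. The function names and the group means are illustrative assumptions, and the sketch assumes that a value exactly equal to a cutpoint is allocated to the higher-numbered group.

```python
from bisect import bisect_right


def cutpoints(group_means):
    """Midpoints between adjacent group means (equal priors, costs and variances)."""
    mu = sorted(group_means)  # groups are numbered in increasing order of mean
    return [(mu[g] + mu[g + 1]) / 2 for g in range(len(mu) - 1)]


def allocate(d, cuts):
    """Return the group number (1, ..., G) for a discriminant function value d."""
    # bisect_right counts the cutpoints that are less than or equal to d.
    return bisect_right(cuts, d) + 1


cuts = cutpoints([1.0, 3.0, 6.0])  # hypothetical means of D in three groups
groups = [allocate(d, cuts) for d in (1.9, 2.0, 4.49, 7.0)]
```

With means 1, 3 and 6 the cutpoints are 2 and 4.5, so the four example values are allocated to groups 1, 2, 2 and 3 respectively.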


The misclassification rate is the percentage of observations that are
misclassified:

\[ \text{misclassification rate} = \frac{\text{number misclassified}}{\text{number in sample}} \times 100\%. \]


Information on the way in which observations are misclassified is conveyed in
a confusion matrix. When there are $G$ groups, the confusion matrix has
$G$ rows and $G$ columns, and element $(i, j)$ is the percentage of observations
in group $i$ that were allocated to group $j$.
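A misclassification rate and a confusion matrix can be computed from the actual and allocated group labels. The sketch below uses made-up labels and illustrative function names; groups are numbered from 1.

```python
def confusion_matrix(actual, allocated, n_groups):
    """Element (i, j): percentage of observations in group i allocated to group j."""
    counts = [[0] * n_groups for _ in range(n_groups)]
    for a, b in zip(actual, allocated):
        counts[a - 1][b - 1] += 1
    matrix = []
    for row in counts:
        size = sum(row)
        # A group with no observations gets a row of zeros rather than an error.
        matrix.append([100.0 * c / size if size else 0.0 for c in row])
    return matrix


def misclassification_rate(actual, allocated):
    """Percentage of observations allocated to a group other than their own."""
    wrong = sum(1 for a, b in zip(actual, allocated) if a != b)
    return 100.0 * wrong / len(actual)


actual = [1, 1, 1, 1, 2, 2, 2, 2]     # hypothetical true groups
allocated = [1, 1, 1, 2, 2, 2, 1, 1]  # hypothetical allocations
rate = misclassification_rate(actual, allocated)
matrix = confusion_matrix(actual, allocated, n_groups=2)
```

Here three of the eight observations are misclassified, and each row of the confusion matrix sums to 100%.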


5.5 Bayesian statistics 


The Bayesian approach 


1 
The probability of an event may sometimes be estimated using the observed
or hypothetical relative frequency of the event. If this is not possible, 
subjective estimates may be required. These represent the opinions and 
beliefs of the person making the estimate. 


For two events $A$ and $B$, the conditional probabilities $P(A|B)$ and $P(B|A)$
are related by Bayes' theorem:

\[ P(A|B) = \frac{P(B|A)\, P(A)}{P(B)}, \]

where the probability $P(B)$ may be obtained using the formula

\[ P(B) = P(B|A)\, P(A) + P(B|\text{not } A)\, P(\text{not } A). \]


Bayes’ theorem provides a way of updating probabilities. In the absence of 
additional information, a prior probability is determined. Once additional 
information becomes available, the probability is revised to obtain the 
posterior probability. In sequential updating, this procedure is 
repeated several times. 
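Updating a probability with Bayes' theorem, and repeating the update sequentially, can be sketched as below. The function name and the numerical values are hypothetical, and the second update assumes that the two pieces of information are independent given A and given not A.

```python
def bayes_update(prior, p_b_given_a, p_b_given_not_a):
    """Posterior P(A|B) from the prior P(A) and the two conditional probabilities."""
    p_b = p_b_given_a * prior + p_b_given_not_a * (1.0 - prior)
    return p_b_given_a * prior / p_b


# Hypothetical values: P(A) = 0.01, P(B|A) = 0.95, P(B|not A) = 0.10.
first = bayes_update(0.01, 0.95, 0.10)
# Sequential updating: the first posterior becomes the prior for the next update.
second = bayes_update(first, 0.95, 0.10)
```

Each further occurrence of B raises the probability of A, because B is more probable when A holds than when it does not.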


In Bayesian inference about a parameter $\theta$, prior beliefs about $\theta$
are represented by a prior distribution with probability density function
$f(\theta)$, called the prior density. A prior is said to be weak or strong
according to how peaked it is, greater uncertainty about $\theta$ corresponding
to flatter priors.


The information about a parameter $\theta$ that is contained in observed data
$x_1, \ldots, x_n$ on a random variable $X$ is represented by the likelihood
function $L(\theta)$.


Bayesian inference is based on the posterior distribution for $\theta$, given
the observed data, with posterior density $f(\theta|\text{data})$. This is
obtained from the prior density $f(\theta)$ and the likelihood $L(\theta)$ using
the expression

\[ f(\theta|\text{data}) \propto L(\theta)\, f(\theta) \]

or, in words,

\[ \text{posterior} \propto \text{likelihood} \times \text{prior}. \]


The process of obtaining the posterior distribution and using it for inference 
is called prior to posterior analysis. 


Prior to posterior analyses 
Standard distributions are often used to represent prior beliefs about a
parameter $\theta$. The normal prior $N(a, b)$ may be used to represent beliefs
about $\theta$ that are symmetric about a single most likely value.

• Mode = median = mean = $a$.

• Variance = $b$.

• All values of $\theta$ in the range $-\infty < \theta < \infty$ are possible,
  but only those in the range $a \pm 3\sqrt{b}$ are likely.





The uniform prior $U(a, b)$, with parameters $a$ and $b$, may be used to
represent the belief that the value of $\theta$ lies between $a$ and $b$ when it
is not known which values in the interval $[a, b]$ are more likely than others.

The uniform prior $U(a, b)$ is noninformative if the interval $[a, b]$
necessarily includes all values in the range of $\theta$. Improper uniform
priors may be used to represent lack of prior information about $\theta$ and its
range.


The beta prior with parameters $a > 0$ and $b > 0$, which is denoted
Beta$(a, b)$, may be used to represent beliefs about a proportion $\theta$,
$0 < \theta < 1$.

• When $a > 1$ and $b > 1$, the beta density has a single mode, given by

  \[ \text{mode} = \frac{a - 1}{a + b - 2}. \]

• When $a < 1$, the beta density has a mode at 0. When $b < 1$, it has a
  mode at 1. When $a < 1$ and $b < 1$, the density has two modes, at 0
  and 1.

• The mean and variance of Beta$(a, b)$ are given by

  \[ \text{mean} = \frac{a}{a + b}, \qquad \text{variance} = \frac{ab}{(a + b)^2 (a + b + 1)}. \]

• The larger the value of $a + b$ is, the stronger are the beliefs represented
  by the beta prior.

• The Beta(1, 1) distribution is the same as the uniform distribution
  $U(0, 1)$.


The gamma prior with parameters $a > 0$ and $b > 0$, which is denoted
Gamma$(a, b)$, may be used to represent beliefs about a parameter $\theta$ which
takes only non-negative values. The parameter $a$ is the shape parameter.

• When $a > 1$, the prior has a single mode given by

  \[ \text{mode} = \frac{a - 1}{b}. \]

  When $0 < a < 1$, there is a single mode at 0.

• The mean and variance of Gamma$(a, b)$ are given by

  \[ \text{mean} = \frac{a}{b}, \qquad \text{variance} = \frac{a}{b^2}. \]


Three steps are involved in specifying a prior $f(\theta)$.

• Assess the location of $f(\theta)$.

• Assess the spread of $f(\theta)$.

• Calculate the values of $a$ and $b$ that give the assessed location and
  spread.


Assessing the location of a prior for a parameter $\theta$ is most readily based
on the mode or median. The spread of the prior may be assessed using an
equal-tailed $100(1 - \alpha)\%$ interval $(L, U)$, where

\[ P(\theta < L) = P(\theta > U) = \tfrac{1}{2}\alpha. \]
The mean $a$ and variance $b$ of a normal prior may be chosen as follows:

\[ a = \text{assessed mode or median}, \]

\[ b = \left( \frac{U - L}{2z} \right)^2, \]

where $L$ and $U$ are the assessed values of the $\alpha/2$-quantile and the
$(1 - \alpha/2)$-quantile of $\theta$, respectively, and $z$ is the
$(1 - \alpha/2)$-quantile of $N(0, 1)$.
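Choosing a normal prior from an assessed mode and an equal-tailed interval can be sketched with the standard library. The function name and the assessed values (mode 50, 95% interval from 40 to 60) are hypothetical.

```python
from statistics import NormalDist


def normal_prior(mode, lower, upper, level=0.95):
    """Return (a, b) for an N(a, b) prior from an assessed mode and interval (L, U)."""
    alpha = 1.0 - level
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # (1 - alpha/2)-quantile of N(0, 1)
    b = ((upper - lower) / (2.0 * z)) ** 2
    return mode, b


a, b = normal_prior(mode=50.0, lower=40.0, upper=60.0, level=0.95)
```

For these values z is about 1.96, so the prior variance b is about 26.03 and the prior standard deviation is about 5.1.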





For some likelihoods, a prior can be used which produces a posterior 
distribution of the same form as the prior distribution. Such a prior is called 
a conjugate prior. When a conjugate prior is used, the prior to posterior 
Bayesian analysis is called a conjugate analysis. Some standard conjugate 
analyses are summarized in the table below; x is an observation on a random 
variable $X$, and $\bar{x}$ represents the mean of a sample of $n$ observations on $X$.





Table 2 of the statistical tables 
contains quantiles for the 
standard normal distribution. 


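Name            Prior                        Data                                                      Posterior

beta/binomial   $\theta \sim$ Beta$(a, b)$   $X \sim B(n, \theta)$                                     Beta$(a + x,\; b + n - x)$
gamma/Poisson   $\mu \sim$ Gamma$(a, b)$     $X \sim$ Poisson$(\mu)$                                   Gamma$(a + n\bar{x},\; b + n)$
normal/normal   $\mu \sim N(a, b)$           $X \sim N(\mu, \sigma^2)$, where $\sigma^2$ is known      $N\left( \frac{\sigma^2 a + n b \bar{x}}{\sigma^2 + n b},\; \frac{\sigma^2 b}{\sigma^2 + n b} \right)$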
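The three standard conjugate updates (beta/binomial, gamma/Poisson with the gamma rate parametrization, and normal/normal with known variance) can be written as small functions that map prior parameters and data summaries to posterior parameters. The function names and example values are illustrative assumptions.

```python
def beta_binomial(a, b, x, n):
    """Beta(a, b) prior, x successes in n trials -> Beta(a + x, b + n - x)."""
    return a + x, b + n - x


def gamma_poisson(a, b, xbar, n):
    """Gamma(a, b) prior, n Poisson counts with mean xbar -> Gamma(a + n xbar, b + n)."""
    return a + n * xbar, b + n


def normal_normal(a, b, xbar, n, sigma2):
    """N(a, b) prior on the mean, known data variance sigma2 -> posterior (mean, variance)."""
    denom = sigma2 + n * b
    return (sigma2 * a + n * b * xbar) / denom, sigma2 * b / denom


post_beta = beta_binomial(1, 1, x=7, n=10)                       # (8, 4)
post_gamma = gamma_poisson(2, 1, xbar=3.0, n=5)                  # (17.0, 6)
post_normal = normal_normal(0.0, 1.0, xbar=2.0, n=4, sigma2=1.0)
```

In the normal/normal example the posterior mean lies between the prior mean 0 and the sample mean 2, and the posterior variance is smaller than the prior variance.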


Prior to posterior Bayesian analyses may be undertaken using 
noninformative or improper uniform priors. Some standard analyses are 
summarized in the table below; x is an observation on a random variable X, 
and $\bar{x}$ represents the mean of a sample of $n$ observations on $X$.


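Name               Prior                                             Data                                                      Posterior

uniform/binomial   $\theta \sim U(0, 1)$                             $X \sim B(n, \theta)$                                     Beta$(1 + x,\; 1 + n - x)$
uniform/Poisson    $\mu \sim$ improper uniform on $[0, \infty)$      $X \sim$ Poisson$(\mu)$                                   Gamma$(n\bar{x},\; n)$
uniform/normal     $\mu \sim$ improper uniform on $(-\infty, \infty)$   $X \sim N(\mu, \sigma^2)$, where $\sigma^2$ is known   $N\left( \bar{x},\; \frac{\sigma^2}{n} \right)$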


A plot of the posterior distribution for a parameter $\theta$ is always helpful. The
location of the posterior distribution may be summarized conveniently by the 
posterior mode or the posterior median. The spread of the posterior 
distribution may be summarized by the posterior variance. Probabilities 
calculated from posterior distributions may also be of interest. 


An interval $(l, u)$ is a $100(1 - \alpha)\%$ credible interval for a parameter
$\theta$ if the posterior probability that $l < \theta < u$, given the data, is
equal to $1 - \alpha$:

\[ P(l < \theta < u \,|\, \text{data}) = 1 - \alpha. \]

The probability $1 - \alpha$ is the credibility level of the interval.


A Highest Posterior Density (HPD) credible interval for a posterior
distribution with a single mode contains the most likely values of $\theta$. An
equal-tailed credible interval satisfies

\[ P(\theta < l \,|\, \text{data}) = P(\theta > u \,|\, \text{data}) = \tfrac{1}{2}\alpha. \]
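An equal-tailed credible interval can be read off the posterior quantiles. The sketch below assumes a normal posterior with mean equal to the sample mean and variance equal to the known data variance divided by n; the data summaries (sample mean 10, variance 4, n = 16) and the function name are hypothetical.

```python
from statistics import NormalDist


def equal_tailed_interval(posterior, level=0.95):
    """Return (l, u) with posterior probability alpha/2 in each tail."""
    alpha = 1.0 - level
    return posterior.inv_cdf(alpha / 2.0), posterior.inv_cdf(1.0 - alpha / 2.0)


xbar, sigma2, n = 10.0, 4.0, 16  # hypothetical sample mean, known variance, sample size
posterior = NormalDist(mu=xbar, sigma=(sigma2 / n) ** 0.5)
l, u = equal_tailed_interval(posterior, level=0.95)
coverage = posterior.cdf(u) - posterior.cdf(l)
```

Because a normal posterior is symmetric with a single mode, this equal-tailed interval is also the HPD interval; for a skewed posterior the two would differ.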


Bayesian inference via simulation 
When a conjugate analysis does not seem appropriate, or when the
mathematics involved in using a conjugate analysis is complicated, 
simulation can be used to obtain information about the posterior 
distribution. Simulation is particularly useful in non-conjugate Bayesian 
analyses or when functions of parameters are of interest. 


Stochastic simulation, or Monte Carlo (MC) simulation, involves 
mimicking the properties of a distribution by ‘randomly’ sampling values 
from the distribution. 


The Monte Carlo standard error of a mean obtained by simulation, or 
the MC error, relates to the variability of the simulation, and may be 
reduced by increasing N, the number of values sampled in the simulation. 
The 5% rule of thumb states that N should be large enough to ensure that 
the Monte Carlo standard error of the mean is less than 5% of the sample 
standard deviation. 
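The Monte Carlo standard error and the 5% rule of thumb can be sketched as follows. The code assumes independent sampled values, for which the MC error of the mean is the sample standard deviation divided by the square root of N; the Beta(8, 4) distribution, the seed and the function name are illustrative choices.

```python
import math
import random
import statistics


def mc_summary(values):
    """Return the sample mean, sample standard deviation and MC error of the mean."""
    n = len(values)
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    mc_error = sd / math.sqrt(n)  # valid for independent draws only
    return mean, sd, mc_error


rng = random.Random(42)
draws = [rng.betavariate(8, 4) for _ in range(1000)]  # N = 1000 simulated values
mean, sd, mc_error = mc_summary(draws)
meets_rule = mc_error < 0.05 * sd  # the 5% rule of thumb
```

Under independent sampling the rule reduces to requiring N to be greater than 400, whatever the distribution, so N = 1000 satisfies it.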


To make inferences about a parameter $\phi$ which is some function $g(\theta)$
of a parameter $\theta$ that can readily be simulated, proceed as follows.

• Simulate $N$ values of $\theta$, denoted $\theta_1, \ldots, \theta_N$.

• Apply the function $g$ to each of the simulated values, to give values
  $\phi_1 = g(\theta_1), \ldots, \phi_N = g(\theta_N)$.

• Use these values to make inferences about $\phi$.
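The three steps above can be sketched for a proportion with a Beta(8, 4) distribution and the odds as the function of interest. The choice of distribution, the function g, the seed and the value of N are all illustrative assumptions.

```python
import random
import statistics

rng = random.Random(7)
N = 20000

# Step 1: simulate N values of theta (here from a Beta(8, 4) distribution).
theta = [rng.betavariate(8, 4) for _ in range(N)]

# Step 2: apply g to each simulated value; here g(theta) = theta / (1 - theta).
phi = [t / (1.0 - t) for t in theta]

# Step 3: use the simulated values of phi for inference.
phi_mean = statistics.fmean(phi)
cuts = statistics.quantiles(phi, n=40)  # 39 cut points at 2.5%, 5%, ..., 97.5%
lower, upper = cuts[0], cuts[-1]        # estimated 0.025- and 0.975-quantiles
```

The pair `(lower, upper)` estimates an equal-tailed 95% interval for the odds, and `phi_mean` estimates its mean.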


For a Bayesian analysis involving more than one unknown parameter, 
interest lies in the joint distribution and in the marginal distributions of the 
parameters. 


• The joint distribution $f(\theta, \phi)$ of two unknown parameters $\theta$
  and $\phi$ describes how the two parameters vary together, and may be
  represented by a scatterplot of simulated pairs of values
  $(\theta_1, \phi_1), \ldots, (\theta_N, \phi_N)$.

• The marginal distributions are the distributions of $\theta$ and $\phi$
  considered separately, and may be estimated using histograms of the
  simulated values $\theta_1, \ldots, \theta_N$ and $\phi_1, \ldots, \phi_N$,
  respectively.

• The mean of the marginal distribution of a parameter can be estimated
  by the sample mean of the simulated values of the parameter; quantiles
  of the distribution can be estimated using sample quantiles.


Markov chain Monte Carlo 
A Markov chain is a sequence of random variables $X_1, X_2, \ldots$ for which the
distribution of $X_{t+1}$ depends only on the value of $X_t$ and not on any earlier
values in the chain. A realization of a Markov chain may be represented 
using a trace plot, that is, a plot in which the values of the Markov chain 
are plotted against the iteration number. Under suitable conditions, the 
values in a realization of a Markov chain will eventually settle down, or 
converge, to an equilibrium distribution. 


Markov chain Monte Carlo (MCMC) is a technique for obtaining a 
posterior distribution of interest as the equilibrium distribution of a Markov 
chain. It is particularly useful when conjugate analyses are not available. 


Convergence of a Markov chain can be assessed graphically by running the 
Markov chain several times from different initial values and checking that the 
realizations eventually overlap. The period before they overlap is the 
burn-in period. Inferences can be based on all samples obtained after the 
burn-in period. 


Samples obtained using MCMC are dependent. However, the MC error can 
still be estimated and the 5% rule of thumb used to estimate the sample size 
to be used. 
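The ideas of running several chains, discarding a burn-in period and pooling the remaining samples can be sketched with a random-walk Metropolis sampler. The handbook does not name a particular MCMC algorithm, so Metropolis is an assumption here, as are the target (an unnormalised Beta(8, 4) posterior), the step size, the burn-in length and all names.

```python
import math
import random
import statistics


def log_post(theta):
    """Unnormalised log posterior theta^7 (1 - theta)^3, i.e. a Beta(8, 4) density."""
    if not 0.0 < theta < 1.0:
        return -math.inf
    return 7.0 * math.log(theta) + 3.0 * math.log(1.0 - theta)


def metropolis(start, n_iter, step=0.2, seed=0):
    """Random-walk Metropolis chain started at `start` (must lie in (0, 1))."""
    rng = random.Random(seed)
    chain = [start]
    current, current_lp = start, log_post(start)
    for _ in range(n_iter):
        proposal = current + rng.gauss(0.0, step)
        lp = log_post(proposal)
        # Accept with probability min(1, posterior ratio).
        if rng.random() < math.exp(min(0.0, lp - current_lp)):
            current, current_lp = proposal, lp
        chain.append(current)
    return chain


burn_in = 1000
starts = [0.05, 0.5, 0.95]  # different initial values, one per chain
chains = [metropolis(s, 6000, seed=i) for i, s in enumerate(starts)]
kept = [x for chain in chains for x in chain[burn_in:]]  # samples after burn-in
posterior_mean = statistics.fmean(kept)
```

Plotting each chain against the iteration number gives the trace plots used to judge convergence; the pooled mean should be close to the Beta(8, 4) mean of 2/3.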
6 Statistical tables


Table 1 Probabilities for the standard normal distribution, P(Z < z) 


z 0 1 2 3 4 5 6 7 8 9 


0.0 | 0.5000 0.5040 0.5080 0.5120 0.5160 0.5199 0.5239 0.5279 0.5319 0.5359
0.1 | 0.5398 0.5438 0.5478 0.5517 0.5557 0.5596 0.5636 0.5675 0.5714 0.5753
0.2 | 0.5793 0.5832 0.5871 0.5910 0.5948 0.5987 0.6026 0.6064 0.6103 0.6141
0.3 | 0.6179 0.6217 0.6255 0.6293 0.6331 0.6368 0.6406 0.6443 0.6480 0.6517 
0.4 | 0.6554 0.6591 0.6628 0.6664 0.6700 0.6736 0.6772 0.6808 0.6844 0.6879 
0.5 | 0.6915 0.6950 0.6985 0.7019 0.7054 0.7088 0.7123 0.7157 0.7190 0.7224 
0.6 | 0.7257 0.7291 0.7324 0.7357 0.7389 0.7422 0.7454 0.7486 0.7517 0.7549 
0.7 | 0.7580 0.7611 0.7642 0.7673 0.7704 0.7734 0.7764 0.7794 0.7823 0.7852 
0.8 | 0.7881 0.7910 0.7939 0.7967 0.7995 0.8023 0.8051 0.8078 0.8106 0.8133 
0.9 | 0.8159 0.8186 0.8212 0.8238 0.8264 0.8289 0.8315 0.8340 0.8365 0.8389 


1.0 | 0.8413 0.8438 0.8461 0.8485 0.8508 0.8531 0.8554 0.8577 0.8599 0.8621 


1.1 | 0.8643 0.8665 0.8686 0.8708 0.8729 0.8749 0.8770 0.8790 0.8810 0.8830 
1.2 | 0.8849 0.8869 0.8888 0.8907 0.8925 0.8944 0.8962 0.8980 0.8997 0.9015 
1.3 | 0.9032 0.9049 0.9066 0.9082 0.9099 0.9115 0.9131 0.9147 0.9162 0.9177 
1.4 | 0.9192 0.9207 0.9222 0.9236 0.9251 0.9265 0.9279 0.9292 0.9306 0.9319 
1.5 | 0.9332 0.9345 0.9357 0.9370 0.9382 0.9394 0.9406 0.9418 0.9429 0.9441 
1.6 | 0.9452 0.9463 0.9474 0.9484 0.9495 0.9505 0.9515 0.9525 0.9535 0.9545 
1.7 | 0.9554 0.9564 0.9573 0.9582 0.9591 0.9599 0.9608 0.9616 0.9625 0.9633 
1.8 | 0.9641 0.9649 0.9656 0.9664 0.9671 0.9678 0.9686 0.9693 0.9699 0.9706 
1.9 | 0.9713 0.9719 0.9726 0.9732 0.9738 0.9744 0.9750 0.9756 0.9761 0.9767
2.0 | 0.9772 0.9778 0.9783 0.9788 0.9793 0.9798 0.9803 0.9808 0.9812 0.9817
2.1 | 0.9821 0.9826 0.9830 0.9834 0.9838 0.9842 0.9846 0.9850 0.9854 0.9857
2.2 | 0.9861 0.9864 0.9868 0.9871 0.9875 0.9878 0.9881 0.9884 0.9887 0.9890 
2.3 | 0.9893 0.9896 0.9898 0.9901 0.9904 0.9906 0.9909 0.9911 0.9913 0.9916 
2.4 | 0.9918 0.9920 0.9922 0.9925 0.9927 0.9929 0.9931 0.9932 0.9934 0.9936 
2.5 | 0.9938 0.9940 0.9941 0.9943 0.9945 0.9946 0.9948 0.9949 0.9951 0.9952 
2.6 | 0.9953 0.9955 0.9956 0.9957 0.9959 0.9960 0.9961 0.9962 0.9963 0.9964 
2.7 | 0.9965 0.9966 0.9967 0.9968 0.9969 0.9970 0.9971 0.9972 0.9973 0.9974 
2.8 | 0.9974 0.9975 0.9976 0.9977 0.9977 0.9978 0.9979 0.9979 0.9980 0.9981 
2.9 | 0.9981 0.9982 0.9982 0.9983 0.9984 0.9984 0.9985 0.9985 0.9986 0.9986
3.0 | 0.9987 0.9987 0.9987 0.9988 0.9988 0.9989 0.9989 0.9989 0.9990 0.9990
3.1 | 0.9990 0.9991 0.9991 0.9991 0.9992 0.9992 0.9992 0.9992 0.9993 0.9993
3.2 | 0.9993 0.9993 0.9994 0.9994 0.9994 0.9994 0.9994 0.9995 0.9995 0.9995 
3.3 | 0.9995 0.9995 0.9995 0.9996 0.9996 0.9996 0.9996 0.9996 0.9996 0.9997 
3.4 | 0.9997 0.9997 0.9997 0.9997 0.9997 0.9997 0.9997 0.9997 0.9997 0.9998 
3.5 | 0.9998 0.9998 0.9998 0.9998 0.9998 0.9998 0.9998 0.9998 0.9998 0.9998 
3.6 | 0.9998 0.9998 0.9999 0.9999 0.9999 0.9999 0.9999 0.9999 0.9999 0.9999 
3.7 | 0.9999 0.9999 0.9999 0.9999 0.9999 0.9999 0.9999 0.9999 0.9999 0.9999 
3.8 | 0.9999 0.9999 0.9999 0.9999 0.9999 0.9999 0.9999 0.9999 0.9999 0.9999 
3.9 | 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 


4.0 | 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 











Example: If Z ~ N(0,1), then P(Z < 0.62) = 0.7324. 
Table 2 Quantiles for the standard normal distribution, $P(Z < q_\alpha) = \alpha$


α      q_α     | α      q_α    | α      q_α    | α      q_α

0.50   0.0000  | 0.67   0.4399 | 0.84   0.9945 | 0.955  1.695
0.51   0.02507 | 0.68   0.4677 | 0.85   1.036  | 0.960  1.751
0.52   0.05015 | 0.69   0.4959 | 0.86   1.080  | 0.965  1.812
0.53   0.07527 | 0.70   0.5244 | 0.87   1.126  | 0.970  1.881
0.54   0.1004  | 0.71   0.5534 | 0.88   1.175  | 0.975  1.960
0.55   0.1257  | 0.72   0.5828 | 0.89   1.227  | 0.980  2.054
0.56   0.1510  | 0.73   0.6128 | 0.90   1.282  | 0.985  2.170
0.57   0.1764  | 0.74   0.6433 | 0.905  1.311  | 0.990  2.326
0.58   0.2019  | 0.75   0.6745 | 0.910  1.341  | 0.991  2.366
0.59   0.2275  | 0.76   0.7063 | 0.915  1.372  | 0.992  2.409
0.60   0.2533  | 0.77   0.7388 | 0.920  1.405  | 0.993  2.457
0.61   0.2793  | 0.78   0.7722 | 0.925  1.440  | 0.994  2.512
0.62   0.3055  | 0.79   0.8064 | 0.930  1.476  | 0.995  2.576
0.63   0.3319  | 0.80   0.8416 | 0.935  1.514  | 0.996  2.652
0.64   0.3585  | 0.81   0.8779 | 0.940  1.555  | 0.997  2.748
0.65   0.3853  | 0.82   0.9154 | 0.945  1.598  | 0.998  2.878
0.66   0.4125  | 0.83   0.9542 | 0.950  1.645  | 0.999  3.090


Example: If $Z \sim N(0, 1)$, then $P(Z < 1.645) = 0.950$, so $q_{0.95} = 1.645$.
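The worked examples for Tables 1 and 2 can be reproduced with Python's standard library, which is a convenient check when a value is needed that the tables do not list. This is an illustrative sketch; the variable names are arbitrary.

```python
from statistics import NormalDist

z = NormalDist()           # standard normal distribution N(0, 1)
p = z.cdf(0.62)            # Table 1 lookup: P(Z < 0.62), about 0.7324
q = z.inv_cdf(0.95)        # Table 2 lookup: 0.95-quantile, about 1.645
p_rounded = round(p, 4)
q_rounded = round(q, 3)
```

`cdf` plays the role of Table 1 and `inv_cdf` the role of Table 2.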


Table 3 Quantiles for $\chi^2$-distributions


df |   0.1    0.3    0.5    0.6    0.7    0.8    0.9   0.95  0.975   0.99  0.995  0.999

 1 | 0.016  0.148  0.455  0.708   1.07   1.64   2.71   3.84   5.02   6.63   7.88  10.83
 2 | 0.211  0.713   1.39   1.83   2.41   3.22   4.61   5.99   7.38   9.21  10.60  13.82
 3 | 0.584   1.42   2.37   2.95   3.66   4.64   6.25   7.81   9.35  11.34  12.84  16.27
 4 |  1.06   2.19   3.36   4.04   4.88   5.99   7.78   9.49  11.14  13.28  14.86  18.47
 5 |  1.61   3.00   4.35   5.13   6.06   7.29   9.24  11.07  12.83  15.09  16.75  20.52
 6 |  2.20   3.83   5.35   6.21   7.23   8.56  10.64  12.59  14.45  16.81  18.55  22.46
 7 |  2.83   4.67   6.35   7.28   8.38   9.80  12.02  14.07  16.01  18.48  20.28  24.32
 8 |  3.49   5.53   7.34   8.35   9.52  11.03  13.36  15.51  17.53  20.09  21.95  26.12
 9 |  4.17   6.39   8.34   9.41  10.66  12.24  14.68  16.92  19.02  21.67  23.59  27.88
10 |  4.87   7.27   9.34  10.47  11.78  13.44  15.99  18.31  20.48  23.21  25.19  29.59
11 |  5.58   8.15  10.34  11.53  12.90  14.63  17.28  19.68  21.92  24.72  26.76  31.26
12 |  6.30   9.03  11.34  12.58  14.01  15.81  18.55  21.03  23.34  26.22  28.30  32.91
13 |  7.04   9.93  12.34  13.64  15.12  16.98  19.81  22.36  24.74  27.69  29.82  34.53
14 |  7.79  10.82  13.34  14.69  16.22  18.15  21.06  23.68  26.12  29.14  31.32  36.12
15 |  8.55  11.72  14.34  15.73  17.32  19.31  22.31  25.00  27.49  30.58  32.80  37.70
16 |  9.31  12.62  15.34  16.78  18.42  20.47  23.54  26.30  28.85  32.00  34.27  39.25
17 | 10.09  13.53  16.34  17.82  19.51  21.61  24.77  27.59  30.19  33.41  35.72  40.79
18 | 10.86  14.44  17.34  18.87  20.60  22.76  25.99  28.87  31.53  34.81  37.16  42.31
19 | 11.65  15.35  18.34  19.91  21.69  23.90  27.20  30.14  32.85  36.19  38.58  43.82
20 | 12.44  16.27  19.34  20.95  22.77  25.04  28.41  31.41  34.17  37.57  40.00  45.31
21 | 13.24  17.18  20.34  21.99  23.86  26.17  29.62  32.67  35.48  38.93  41.40  46.80
22 | 14.04  18.10  21.34  23.03  24.94  27.30  30.81  33.92  36.78  40.29  42.80  48.27
23 | 14.85  19.02  22.34  24.07  26.02  28.43  32.01  35.17  38.08  41.64  44.18  49.73
24 | 15.66  19.94  23.34  25.11  27.10  29.55  33.20  36.42  39.36  42.98  45.56  51.18
25 | 16.47  20.87  24.34  26.14  28.17  30.68  34.38  37.65  40.65  44.31  46.93  52.62
26 | 17.29  21.79  25.34  27.18  29.25  31.79  35.56  38.89  41.92  45.64  48.29  54.05
27 | 18.11  22.72  26.34  28.21  30.32  32.91  36.74  40.11  43.19  46.96  49.64  55.48
28 | 18.94  23.65  27.34  29.25  31.39  34.03  37.92  41.34  44.46  48.28  50.99  56.89
29 | 19.77  24.58  28.34  30.28  32.46  35.14  39.09  42.56  45.72  49.59  52.34  58.30
30 | 20.60  25.51  29.34  31.32  33.53  36.25  40.26  43.77  46.98  50.89  53.67  59.70
31 | 21.43  26.44  30.34  32.35  34.60  37.36  41.42  44.99  48.23  52.19  55.00  61.10
32 | 22.27  27.37  31.34  33.38  35.66  38.47  42.58  46.19  49.48  53.49  56.33  62.49
33 | 23.11  28.31  32.34  34.41  36.73  39.57  43.75  47.40  50.73  54.78  57.65  63.87
34 | 23.95  29.24  33.34  35.44  37.80  40.68  44.90  48.60  51.97  56.06  58.96  65.25
35 | 24.80  30.18  34.34  36.47  38.86  41.78  46.06  49.80  53.20  57.34  60.27  66.62
36 | 25.64  31.12  35.34  37.50  39.92  42.88  47.21  51.00  54.44  58.62  61.58  67.99
37 | 26.49  32.05  36.34  38.53  40.98  43.98  48.36  52.19  55.67  59.89  62.88  69.35
38 | 27.34  32.99  37.34  39.56  42.05  45.08  49.51  53.38  56.90  61.16  64.18  70.70
39 | 28.20  33.93  38.34  40.59  43.11  46.17  50.66  54.57  58.12  62.43  65.48  72.05
40 | 29.05  34.87  39.34  41.62  44.16  47.27  51.81  55.76  59.34  63.69  66.77  73.40


Example: If $X \sim \chi^2(4)$, the chi-squared distribution on 4 degrees of freedom (df),
then $P(X < 7.78) = 0.9$, so $q_{0.9} = 7.78$.
Printed in the United Kingdom.


